[Models] add fleet model fallback #7732
Thanks for your contribution!
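The CI report is generated from the following code (updated every 30 minutes):
1 Required jobs: 8/10 passed
2 Failure details
🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: flaky issue (confidence: medium)
Analyzer: ci_analyze_unittest_fastdeploy | Error type: flaky issue | Confidence: medium
Failed cases:
Key logs:
Suggested fix:
Related changes: not modified by this PR
🔴 Approval: approval required
This job requires manual approval; CI resumes only after the approval is completed. Please approve it manually.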
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7732 +/- ##
==========================================
Coverage ? 67.72%
==========================================
Files ? 468
Lines ? 65509
Branches ? 10067
==========================================
Hits ? 44365
Misses ? 18299
Partials ? 2845
/re-run all-failed
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-06-01 14:34:00
📋 Review Summary
PR overview: adds PaddleFleet as a model inference backend (--model-impl paddlefleet) and reuses the KV Cache by replacing core_attention in PaddleFleet's TransformerLayer with FastDeploy Attention.
Change scope: model_executor/models/, config.py, engine/args_utils.py, graph_optimization/decorator.py, scripts/, tests/
Impact tags: [Models] [FDConfig] [Engine] [Graph Optimization] [CI]
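The attribute replacement described in the overview can be pictured with a small, framework-free sketch. Every class here is a hypothetical stand-in, not PaddleFleet or FastDeploy code, and the `self_attention` attribute name and the `patch_core_attention` helper are assumptions made for illustration only.

```python
class CoreAttention:
    """Hypothetical stand-in for the original attention module."""

    def __call__(self, q, k, v):
        return "original"


class FastDeployAttentionStub:
    """Hypothetical stand-in for a replacement attention that would reuse the KV cache."""

    def __init__(self, layer_id):
        self.layer_id = layer_id

    def __call__(self, q, k, v):
        return f"fastdeploy(layer_id={self.layer_id})"


class SelfAttention:
    def __init__(self):
        self.core_attention = CoreAttention()


class TransformerLayer:
    def __init__(self):
        # Assumption: the attention block is exposed as `self_attention`.
        self.self_attention = SelfAttention()


def patch_core_attention(layers):
    """Swap core_attention on each layer and return how many layers were patched."""
    patched = 0
    for layer_id, layer in enumerate(layers):
        attn = getattr(layer, "self_attention", None)
        if attn is not None and hasattr(attn, "core_attention"):
            attn.core_attention = FastDeployAttentionStub(layer_id)
            patched += 1
    return patched


layers = [TransformerLayer() for _ in range(2)]
patched_count = patch_core_attention(layers)
outputs = [layer.self_attention.core_attention(None, None, None) for layer in layers]
```

Returning the patched count gives the caller a cheap way to detect a model whose layers did not match the expected structure.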
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/model_executor/graph_optimization/decorator.py:68 | graph_opt_backend does not accept positional arguments; forwarding *args will crash when graph optimization is enabled |
Status of Previous Findings
| Finding | Issue | Status |
|---|---|---|
| F1 | Missing space in pip command string concatenation | ✅ Fixed |
| F2 | params_dtype hard-coded to bfloat16 | ✅ Fixed (now uses self.model_config.dtype or "bfloat16") |
| F3 | Same as F1 | ✅ Fixed |
| F4 | Missing space in implicitly concatenated help text | |
| F5 | PretrainedModel import changed to an internal path | |
| F6 | Same as F1 | ✅ Fixed |
| F7 | layer_number is 1-indexed vs. Attention is 0-indexed | |
| F8 | load_weights lacks logging | ✅ Fixed |
| F9 | Reference to a nonexistent test file | ✅ Fixed |
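The missing-space findings (F1, F3, F4, F6) all come from the same Python behavior: adjacent string literals are joined with no separator. A minimal illustration follows; the help text is invented for the example and is not the text from this PR.

```python
# Adjacent string literals are concatenated at compile time with nothing in between.
broken_help = (
    "Model implementation to use."
    "Defaults to auto."
)

# Ending the first literal with a space keeps the sentences apart.
fixed_help = (
    "Model implementation to use. "
    "Defaults to auto."
)
```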
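📝 PR Convention Check
The PR description structure is compliant, but the Checklist check states are inconsistent:
- [ ] Add unit tests: the PR already adds tests/model_executor_fallback/test_fallback_fleet_model.py, so this should be changed to [x]
- [x] Provide accuracy results: the Accuracy Tests section is filled in as N/A, so this should be changed to [ ] (noting the reason in parentheses is enough)
Suggested title (can be copied directly):
[Models] Add PaddleFleet model fallback backend
Suggested PR description (can be copied directly):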
## Motivation
Add PaddleFleet as a model inference backend (`--model-impl paddlefleet`). By replacing `core_attention` in PaddleFleet's TransformerLayer with the FastDeploy Attention kernel, the PaddleFleet model structure reuses FastDeploy's KV Cache and high-performance Attention computation.
## Modifications
- `config.py`: add `paddlefleet` to the `ModelImpl` type definition
- `engine/args_utils.py`: support the `--model-impl paddlefleet` CLI argument and add validation logic
- `worker/worker_process.py`: update the `--model-impl` choices to match
- `model_executor/models/paddleformers/base_fleet.py`: add the `PaddleFleetModelBase` base class, the `FastDeployAttention` layer, and the `patch_paddlefleet_core_attention` replacement function
- `model_executor/models/paddleformers/__init__.py`: register the `PaddleFleetForCausalLM` model class
- `model_executor/graph_optimization/decorator.py`: fix `__call__` to support positional arguments (`*args`)
- `scripts/coverage_run.sh`: add an `isolated` test category so that fleet-related tests run last
- `tests/model_executor_fallback/`: add `conftest.py` and `test_fallback_fleet_model.py`
## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--model /path/to/model \
--model-impl paddlefleet
```
## Accuracy Tests
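N/A (this PR adds the PaddleFleet inference backend and does not yet provide logits alignment data against a reference implementation; a follow-up PR will add the alignment results)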
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall Assessment
The overall design is sound: it reuses the KV Cache for PaddleFleet models by monkey-patching core_attention. 5 of the 9 previous findings have been fixed. Prioritize the two remaining issues, F4 (space in the help text) and F7 (layer_id offset), along with the decorator *args compatibility issue found in this round.
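For the F7 offset, one possible shape of the conversion is sketched below. It assumes, as the finding states, that `layer_number` counts from 1 while the Attention layer index counts from 0; the helper name is hypothetical and the actual fix in the PR may look different.

```python
def to_zero_based_layer_id(layer_number: int) -> int:
    """Convert a 1-based layer_number to a 0-based layer index.

    Hypothetical helper: assumes layer_number starts at 1.
    """
    if layer_number < 1:
        raise ValueError(f"layer_number must be >= 1, got {layer_number}")
    return layer_number - 1


first_layer_id = to_zero_based_layer_id(1)
```

Rejecting values below 1 makes an accidental double conversion fail loudly instead of silently indexing the wrong KV cache slot.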
  return self.forward(*args, **kwargs)
- return self.graph_opt_backend(**kwargs)
+ return self.graph_opt_backend(*args, **kwargs)
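🟡 Suggestion: graph_opt_backend.__call__ accepts only **kwargs (GraphOptBackend.__call__(self, **kwargs)), so forwarding *args here raises a TypeError when use_graph_opt=True and positional arguments are passed.
The PaddleFleet model already applies the @support_graph_optimization decorator, so if a user enables graph optimization, the call chain will hit this path.
Suggested fix: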
    def __call__(self, *args, **kwargs):
        """Decorator model.__call__() func"""
        if not self.use_graph_opt:
            return self.forward(*args, **kwargs)
        # graph_opt_backend only supports kwargs
        return self.graph_opt_backend(**kwargs)

Alternatively, add *args support to GraphOptBackend.__call__ as well.
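The failure mode, and a third option that neither drops positional arguments nor changes the backend, can be sketched with stand-in classes. Nothing below is FastDeploy code: `KwargsOnlyBackend`, `Model`, and the `forward` parameter names are hypothetical. The idea is to bind positional arguments to `forward`'s parameter names before handing them to a keyword-only callable.

```python
import inspect


class KwargsOnlyBackend:
    """Hypothetical stand-in for a backend whose __call__ takes keyword arguments only."""

    def __call__(self, **kwargs):
        return kwargs


class Model:
    def __init__(self, use_graph_opt):
        self.use_graph_opt = use_graph_opt
        self.graph_opt_backend = KwargsOnlyBackend()

    def forward(self, ids_remove_padding, forward_meta=None):
        # Parameter names are illustrative assumptions.
        return {"ids_remove_padding": ids_remove_padding, "forward_meta": forward_meta}

    def __call__(self, *args, **kwargs):
        if not self.use_graph_opt:
            return self.forward(*args, **kwargs)
        # Map positional arguments onto forward's parameter names so the
        # keyword-only backend still receives every argument.
        bound = inspect.signature(self.forward).bind(*args, **kwargs)
        return self.graph_opt_backend(**bound.arguments)


# Passing a positional argument straight to a **kwargs-only callable fails.
try:
    KwargsOnlyBackend()(1, forward_meta=2)
    raised = False
except TypeError:
    raised = True

result = Model(use_graph_opt=True)(1, forward_meta=2)
```

This binding approach only works cleanly when `forward` has named parameters; a `forward` that itself declares `*args` would need different handling.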
Motivation
Add PaddleFleet as a model inference backend (`--model-impl paddlefleet`). By replacing `core_attention` in PaddleFleet's TransformerLayer with the FastDeploy Attention kernel, the PaddleFleet model structure reuses FastDeploy's KV Cache and high-performance Attention computation.
Modifications
- `config.py`: add `paddlefleet` to the `ModelImpl` type definition
- `engine/args_utils.py`: support the `--model-impl paddlefleet` CLI argument and add validation logic
- `model_executor/models/paddleformers/base_fleet.py`: add the `PaddleFleetModelBase` base class, the `FastDeployAttention` layer, and the `patch_paddlefleet_core_attention` replacement function
- `model_executor/models/paddleformers/__init__.py`: register the `PaddleFleetForCausalLM` model class
Usage or Command
    python -m fastdeploy.entrypoints.openai.api_server \
      --model /path/to/model \
      --model-impl paddlefleet
Accuracy Tests
N/A (this PR adds the PaddleFleet inference backend and does not yet provide logits alignment data against a reference implementation)
Checklist
- Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
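The `--model-impl paddlefleet` option listed under Modifications can be pictured as a `choices`-validated argument. This is a sketch with the standard library's `argparse`, not FastDeploy's actual argument handling: only `paddlefleet` comes from this PR, while the other choice values and the default are assumptions.

```python
import argparse

# Only "paddlefleet" is taken from the PR; the remaining values are assumed.
MODEL_IMPL_CHOICES = ("auto", "fastdeploy", "paddleformers", "paddlefleet")


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-impl",
        choices=MODEL_IMPL_CHOICES,
        default="auto",
        help="Model implementation backend to use.",
    )
    return parser


args = build_parser().parse_args(["--model-impl", "paddlefleet"])
```

With `choices` set, argparse rejects an unknown value at parse time and exits with a usage error, so the typo never reaches model loading.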